Add script that automatically updates postgres databases to a later version #635
E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3959/
Result 🆘 ABORTED
BIRDHOUSE_DEPLOY_BRANCH : update-postgres-script
DACCS_IAC_BRANCH : master
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-91.rdext.crim.ca
birdhouse/scripts/update-postgres.sh
Outdated
    #!/usr/bin/env sh

    # This script updates all postgres databases that are used by components in this repository.
    # This includes magpie and all WPS birds that use the postgres component.
    # This does not include test components like optional-components/generic_bird and will not update
    # custom components (ones not from this repository).
    #
    # It will update postgres databases to the version specified by the POSTGRES_VERSION_UPDATE
    # environment variable.
    # All of the old database files will be copied to a temporary directory in case you want to inspect
    # them or revert this operation later on. To specify which directory to write these backups to
    # set the DATA_BACKUP_DIR variable (default: ${TMPDIR:-/tmp}/birdhouse-postgres-migrate-backup/)
    # Note that backups in the form of database dumps will also be written to the named volume or directory
    # specified by the BIRDHOUSE_BACKUP_VOLUME variable.
    #
    # For example, to update the current postgres databases to version 18.1 and write backups to /tmp/test/
    #
    # $ POSTGRES_VERSION_UPDATE=18.1 DATA_BACKUP_DIR=/tmp/test/ ./update-postgres.sh
An additional note worth providing concerns the "types" of "postgres" that are supported.
For example, the postgis-based one for stac-db will not work because of the extra plugins and definitions. Its migration must be done separately.
The plugins aren't actually an issue because they should be included in whichever new image we upgrade to. The actual reason is that the other ones (like stac) need data migrations as well.
Isn't restic backup doing a db dump? The new DB should have all the same tables so isn't that a DB migration?
It's not necessarily that simple. Dumping a database with pg_dump is almost always going to be backwards compatible with previous versions of postgres but it's not necessarily always going to be forwards compatible with a later version.
Depending on the type of data, indexes, etc. you could run into a situation where a dump can't easily be ported to a newer version. Magpie and the WPS birds don't have any content in their databases that could cause those issues. But we can't just assume that this script will work for any postgres database.
I over-simplified my comment, but essentially what you mentioned about forward-compatibility is what I was referring to, notably for the geometry types and functions as well as the special ones STAC defines (e.g. its collection_search function). They could work depending on versions, or could break if a major change is introduced. Indeed, the Magpie/WPS birds are basic enough that we don't have to worry about those.
E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3961/
Result 🆘 ABORTED
BIRDHOUSE_DEPLOY_BRANCH : update-postgres-script
DACCS_IAC_BRANCH : master
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-91.rdext.crim.ca
tlvu
left a comment
Neat little script. Will come in handy one day for sure.
Usage question: is the intended workflow to run this script, then immediately check out a newer version of birdhouse-deploy that has the newer postgres version? Otherwise the temporary setting of MAGPIE_POSTGRES_VERSION=${POSTGRES_VERSION_UPDATE} POSTGRES_VERSION=${POSTGRES_VERSION_UPDATE} will not last. Should this intended workflow be documented?
Also, I tried to see if it can support generic_bird as well. It seems pretty easy, except that generic_bird uses a data volume instead of a volume mount from disk. Did I forget other details needed to support generic_bird? We actually use generic_bird in production to provide a second instance of Finch (finchasync https://pavics.ouranos.ca/twitcher/ows/proxy/finchasync?service=WPS&version=1.0.0&request=GetCapabilities).
If I missed any steps, please let me know so I can come back to this PR the day I need to migrate generic_bird.
Other minor suggestions, none blocking.
birdhouse/scripts/update-postgres.sh
Outdated
    : ${POSTGRES_VERSION_UPDATE:?"$(log ERROR "POSTGRES_VERSION_UPDATE must be set")"}

    DATA_BACKUP_DIR="${DATA_BACKUP_DIR:-"${TMPDIR:-/tmp}"/birdhouse-postgres-migrate-backup/}"
    POSTGRES_COMPONENTS="-a magpie -a $(birdhouse -q configs -c 'echo $POSTGRES_DATABASES_TO_CREATE' | sed 's/ / -a /g')"
So if I want to hack this so it also handles generic_bird, all I need to do is temporarily add generic_bird to POSTGRES_DATABASES_TO_CREATE?
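For context on the question above, here is a minimal sketch of how a space-separated list becomes repeated `-a` flags. The list value is made-up example data standing in for the output of `birdhouse -q configs -c 'echo $POSTGRES_DATABASES_TO_CREATE'`, and `to_flags` is a hypothetical helper that is not part of the script. The sketch only shows the string handling; it cannot confirm whether the rest of the script would then cope with generic_bird.

```shell
#!/usr/bin/env sh
# Sketch only: turning component names into repeated -a flags.
# The list below is made-up example data, not real configuration.
POSTGRES_DATABASES_TO_CREATE="finch raven"

# Approach used in the script: replace every space with " -a ".
sed_flags="-a magpie -a $(echo "${POSTGRES_DATABASES_TO_CREATE}" | sed 's/ / -a /g')"

# Alternative (hypothetical helper): build the flags word by word,
# which also copes with an empty list or repeated spaces.
to_flags() {
    flags=""
    for name in "$@"; do
        flags="${flags:+${flags} }-a ${name}"
    done
    echo "${flags}"
}

# Word splitting of the unquoted list is intentional here.
loop_flags="$(to_flags magpie ${POSTGRES_DATABASES_TO_CREATE})"

echo "sed : ${sed_flags}"
echo "loop: ${loop_flags}"
```

With the `sed` form, an empty list would leave a dangling `-a` at the end of the string, which is the main reason the loop variant is shown alongside it.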
    mkdir -p "${DATA_BACKUP_DIR}"
    mv "${MAGPIE_PERSIST_DIR}" "${DATA_BACKUP_DIR}"
    mv "${POSTGRES_DATA_DIR}" "${DATA_BACKUP_DIR}"
Ah crap, generic_bird is a data_volume, not a folder mount!
    mv "${MAGPIE_PERSIST_DIR}" "${DATA_BACKUP_DIR}"
    mv "${POSTGRES_DATA_DIR}" "${DATA_BACKUP_DIR}"

    MAGPIE_POSTGRES_VERSION=${POSTGRES_VERSION_UPDATE} POSTGRES_VERSION=${POSTGRES_VERSION_UPDATE} ${BIRDHOUSE_EXE} compose up -d
Humm will need GENERIC_BIRD_POSTGRES_IMAGE=${POSTGRES_VERSION_UPDATE} here as well.
    If you are satisfied that the databases have been updated properly please add the following to your local environment file:

    export MAGPIE_POSTGRES_VERSION=${POSTGRES_VERSION_UPDATE}
    export POSTGRES_VERSION=${POSTGRES_VERSION_UPDATE}
Why not put these exports before the ${BIRDHOUSE_EXE} compose up -d above?
This is part of a log message
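The thread above contrasts the one-off `VAR=value` prefix on the `compose up` line with the `export` lines shown in the log message. A minimal sketch of why the prefix does not persist follows. It uses `sh -c` as a stand-in for `${BIRDHOUSE_EXE} compose up -d`, which is an assumption made only so the example runs anywhere.

```shell
#!/usr/bin/env sh
# Sketch: a VAR=value prefix applies to that one command only.
# `sh -c` is a stand-in for the real compose command (assumption).
unset POSTGRES_VERSION

# The child process sees the temporary value...
inside="$(POSTGRES_VERSION=18.1 sh -c 'echo "${POSTGRES_VERSION:-unset}"')"

# ...but the calling shell never had the variable set.
after="${POSTGRES_VERSION:-unset}"

echo "during the prefixed command: ${inside}"
echo "in the calling shell afterwards: ${after}"
```

This is why the version has to be made permanent separately, either in the local environment file or by checking out a release that already pins the newer version.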
You could immediately check out a newer version of birdhouse or, if you don't want to do that, you can set the postgres versions in the local environment file (this is documented at the end of the script as a log message to the user).
The issue with the data volume shouldn't cause too many problems, except that you wouldn't be able to back up the postgres files in the same way (you'd still have the db_dump backup though). However, it's different enough that I think handling this other case would seriously complicate this script. You could always manually update it with:

    birdhouse backup create --no-restic -a generic_bird
    birdhouse compose down
    docker volume rm birdhouse_postgres_generic_bird
    GENERIC_BIRD_POSTGRES_IMAGE=postgres:18.1 birdhouse compose up -d
    birdhouse backup restore --no-restic -a generic_bird

Why don't we document this instead of over-complicating this script?
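If those manual steps end up in the documentation, one possible way to package them is a small wrapper that prints the commands by default and only executes them on request. This is a sketch under assumptions: `DRY_RUN`, `TARGET_IMAGE`, `run` and `upgrade_generic_bird` are hypothetical names, and only the five commands quoted above come from this thread.

```shell
#!/usr/bin/env sh
# Sketch only: wraps the manual generic_bird steps quoted in this thread.
# DRY_RUN and TARGET_IMAGE are hypothetical names, not birdhouse settings.
set -eu

DRY_RUN="${DRY_RUN:-1}"                        # default: print, do not execute
TARGET_IMAGE="${TARGET_IMAGE:-postgres:18.1}"  # image used in the comment above

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN}" = "0" ]; then
        "$@"
    fi
}

upgrade_generic_bird() {
    run birdhouse backup create --no-restic -a generic_bird
    run birdhouse compose down
    run docker volume rm birdhouse_postgres_generic_bird
    run env GENERIC_BIRD_POSTGRES_IMAGE="${TARGET_IMAGE}" birdhouse compose up -d
    run birdhouse backup restore --no-restic -a generic_bird
}

upgrade_generic_bird
```

Because of `set -e`, running with `DRY_RUN=0` stops at the first failing step, so the volume is not removed if the backup step fails.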
E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3966/
Result 🆘 ABORTED
BIRDHOUSE_DEPLOY_BRANCH : update-postgres-script
DACCS_IAC_BRANCH : master
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-91.rdext.crim.ca
E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3976/
Result ✅ SUCCESS
BIRDHOUSE_DEPLOY_BRANCH : update-postgres-script
DACCS_IAC_BRANCH : master
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-91.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/613/
NOTEBOOK TEST RESULTS
Yes, documentation is perfectly fine. Not everyone uses [...]
I added a step to back up the data volume before deleting it.
I'm not bumping the version before merge because I'm including other open PRs in the version bump.
Overview
In anticipation of upgrading postgres databases in the future, this introduces a script that automatically upgrades postgres databases using the backup/restore process.
This includes magpie and all WPS birds that use the postgres component. This does not include test components like optional-components/generic_bird and will not update custom components (ones not from this repository).
Test components are not assumed to have persistent data that needs to be updated, and we cannot guarantee that other postgres databases used by components outside this repository do not require additional steps (data migrations) in order to comply with a different version of postgres.
It will update postgres databases to the version specified by the POSTGRES_VERSION_UPDATE environment variable.
All of the old database files will be copied to a temporary directory in case you want to inspect them or revert this operation later on. To specify which directory to write these backups to, set the DATA_BACKUP_DIR variable (default: ${TMPDIR:-/tmp}/birdhouse-postgres-migrate-backup/).
Note that backups in the form of database dumps will also be written to the named volume or directory specified by the BIRDHOUSE_BACKUP_VOLUME variable.
For example, to update the current postgres databases to version 18.1 and write backups to /tmp/test/, run the script with POSTGRES_VERSION_UPDATE=18.1 and DATA_BACKUP_DIR=/tmp/test/ set.
In a future update we can update the postgres versions and tell users to run this script first in order to safely migrate data from one version to the next.
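The variable handling described above can be sketched as follows. This is an illustration of the defaulting rules only, not the script itself: `resolve_backup_dir` and `require_version` are hypothetical helper names, and the error text is simplified from the script's `log ERROR` call.

```shell
#!/usr/bin/env sh
# Sketch of the variable handling described in the Overview.
# resolve_backup_dir and require_version are hypothetical helper names.

# DATA_BACKUP_DIR wins if set; otherwise fall back to
# ${TMPDIR:-/tmp}/birdhouse-postgres-migrate-backup/
resolve_backup_dir() {
    echo "${DATA_BACKUP_DIR:-${TMPDIR:-/tmp}/birdhouse-postgres-migrate-backup/}"
}

# Returns non-zero when POSTGRES_VERSION_UPDATE is unset or empty,
# otherwise prints the requested version.
require_version() {
    if [ -z "${POSTGRES_VERSION_UPDATE:-}" ]; then
        echo "ERROR: POSTGRES_VERSION_UPDATE must be set" >&2
        return 1
    fi
    echo "${POSTGRES_VERSION_UPDATE}"
}

echo "backup dir: $(resolve_backup_dir)"
require_version || echo "no target version given"
```

The target version has no default on purpose: an upgrade should never start unless the operator has named the version explicitly.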
Changes
Non-breaking changes
Breaking changes
Related Issue / Discussion
Additional Information
I have tested this by upgrading postgres databases from the current version (9.6) to the latest version (18.1); no data in the magpie and WPS bird databases was affected.
CI Operations
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false